{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Statistical Plots"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In addition to the plots available via the plot interface, hvPlot makes a number of more sophisticated, statistical plots available that are modelled on ``pandas.plotting``. To explore these, we will load the iris and stocks datasets from Bokeh:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import hvplot.pandas  # noqa\n",
    "\n",
    "from bokeh.sampledata import iris, stocks \n",
    "\n",
    "iris = iris.flowers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scatter Matrix\n",
    "\n",
    "When working with multi-dimensional data, it is often difficult to understand the relationship between all the different variables. A ``scatter_matrix`` makes it possible to visualize all of the pairwise relationships in a compact format. ``hvplot.scatter_matrix`` is closely modelled on ``pandas.plotting.scatter_matrix``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hvplot.scatter_matrix(iris, c=\"species\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compared to a static Seaborn/Matplotlib-based plot, here it is easy to explore the data interactively thanks to Bokeh's linked zooming, linked panning, and linked brushing (using the ``box_select`` and ``lasso_select`` tools).\n",
    "\n",
    "### Parallel Coordinates\n",
    "\n",
    "Parallel coordinate plots provide another way of visualizing multi-variate data. ``hvplot.parallel_coordinates`` provides a simple API to create such a plot, modelled on the API of `pandas.plotting.parallel_coordinates()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hvplot.parallel_coordinates(iris, \"species\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot quickly clarifies the relationship between different variables, highlighting the difference of the \"setosa\" species in the petal width and length dimensions.\n",
    "\n",
    "### Andrews Curves\n",
    "\n",
    "\n",
    "Another similar approach is to visualize the dimensions using Andrews curves, which are constructed by generating a Fourier series from the features of each observation, visualizing the aggregate differences between classes. The ``hvplot.andrews_curves()`` function provides a simple API to generate Andrews curves from a datafrom, closely matching the API of ``pandas.plotting.andrews_curves()``:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "hvplot.andrews_curves(iris, \"species\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once again we can see the significant difference of the setosa species. However, unlike the parallel coordinate plot, the Andrews plot does not give any real quantitative insight into the features that drive those differences.\n",
    "\n",
    "### Lag Plot\n",
    "\n",
    "Lastly, for the analysis of time series hvplot offers a so called lag plot, implemented by the ``hvplot.lag_plot()`` function, modelled on the matching ``pandas.plotting.lag_plot()`` function.\n",
    "\n",
    "As an example we will compare the closing stock prices of Apple and IBM from 2000-2013 using a lag of 365 days:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index = pd.DatetimeIndex(stocks.AAPL['date'])\n",
    "stock_df = pd.DataFrame({'IBM': stocks.IBM['close'], 'AAPL': stocks.AAPL['close']}, index=index)\n",
    "\n",
    "hvplot.lag_plot(stock_df, lag=365, alpha=0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using this plot it becomes apparent that Apple was significantly more volatile over the analyzed time scale. In other words, its price at a particular point in time sometimes differed significantly from the price 365 days in the past. This also becomes visible in a simple line chart of the same data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "stock_df.hvplot.line()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These plot types can help you make sense of complex datasets.  See [holoviews.org](https://holoviews.org) for many other plots and tools that can be used alongside those from hvPlot for other purposes."
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
